Calibrate retirement contributions: targets, SS reconciliation, and QRF imputation by PavelMakarchuk · Pull Request #554 · PolicyEngine/policyengine-us-data

PavelMakarchuk · 2026-02-25T22:12:13Z

Summary

Calibrates retirement contribution variables in the CPS/Extended CPS pipeline with three major changes:

Proportional RETCB_VAL allocation: Splits CPS retirement contributions into traditional/Roth 401(k) and traditional/Roth IRA using admin-data shares from imputation_parameters.yaml (dc_share=0.908, ira_share=0.092, roth_dc_share=0.15, trad_ira_share=0.392)
QRF imputation for PUF clone half: Trains a Quantile Random Forest on the CPS half to predict retirement contributions onto the PUF clone half, ensuring income-contribution consistency (e.g., $0-wage PUF clones get $0 in 401(k) contributions)
Calibration targets: Adds national calibration targets for traditional 401(k) ($482.7B), Roth 401(k) ($85.2B), traditional IRA ($13.2B), Roth IRA ($35.0B), and SE pension ($29.5B)

Additional changes

SE pension cap for dual-income filers: Caps SE pension at min(25% of SE income, IRS dollar limit) so dual-income filers retain a remainder for 401(k)/IRA. Previously all of RETCB_VAL went to SE pension for anyone with SE income.
IRA gate fix: IRA pool now available to anyone with earned income (wages or SE), not just wage earners. SE-only filers can now receive IRA contributions.
Dataset integrity tests: Ensures formula variables are not stored in dataset files, preventing stale-value bugs
policyengine-us upgrade: Updates to v1.587.0
conftest.py fix: Fixes microimpute mocking to use try/except ImportError instead of checking sys.modules, preventing MagicMock from replacing the real QRF during pytest
Retirement limits consolidated: Moved contribution limits from duplicated dicts in cps.py / puf_impute.py to single source in imputation_parameters.yaml
Exception logging improved: _qrf_ss_shares and _impute_retirement_contributions now log full tracebacks on failure
Redundant conftest removed: Inner test_calibration/conftest.py mock removed (root conftest propagates)
Validation script de-duplicated: Imports targets from loss.py and limits from YAML instead of hardcoding

Test plan

35 unit tests for retirement QRF imputation (test_retirement_imputation.py)
Dataset integrity tests (test_no_formula_variables_stored.py)
CI passes all existing tests (conftest fix resolves test_cps_has_auto_loan_interest / test_cps_has_net_worth failures)
Post-build validation script (validation/validate_retirement_imputation.py) confirms aggregates match targets

🤖 Generated with Claude Code

baogorek

Looks good, just please update this one bullet point in the PR body (if this is indeed correct):

Says:

Remove roth_ira_contributions target ($39B) — structurally $0 due to CPS allocation bug where traditional IRA always exhausts the IRA limit first

But:
code actually keeps roth_ira_contributions and updates it from $39B to $35.0B in both loss.py and etl_national_targets.py (which it can do be

PavelMakarchuk · 2026-03-03T10:52:40Z

@baogorek rewrote the PR fairly substantially since your last review. My first attempt at this so please be as nit-picky as needed.

Replace hard-coded RETIREMENT_LIMITS dict with a shared utility that reads from policyengine-us's parameter tree at runtime. This ensures limits stay in sync as policyengine-us is updated, and eliminates a maintenance risk when the same dict is duplicated in puf_impute.py (PR #554). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

baogorek

@PavelMakarchuk In full transparency, my review was focused on how disruptive this PR would be to #538, which fortunately is not at all. Looking at the printout of items I got from Claude, it looks like there are some duplication issues that are not that serious.

Point 5 about the subsampling is interesting. I believe this is where @juaristi22 found that 40,000 was where diminishing returns started to kick in. But I'm reading that this is new code, and if you found that it works well, it's not a blocker.

I'm going to go ahead and approve, but please consider these points before you merge:

Retirement limits duplicated between cps.py and puf_impute.py — _get_retirement_limits() is a copy-paste of the
limits dict in cps.py with the comment "duplicated here to avoid circular imports." These should be extracted to a
shared location — imputation_parameters.yaml already holds the allocation shares and would be a natural home for the
limits too. Two copies will inevitably drift.
_qrf_ss_shares swallows exceptions silently — The except Exception block returns None without logging the
exception or traceback. _impute_retirement_contributions at least logs %s, e, but neither logs exc_info=True. When
these fail in production you'll have no idea why.
TestLimitsMatchCps doesn't actually cross-check against cps.py — Despite the name, this test just re-calls
_get_retirement_limits() with hardcoded values. A real cross-check would import the CPS limits and assert equality —
which would also catch the duplication problem from point 1.
Duplicate conftest.py — Both root conftest.py and policyengine_us_data/tests/test_calibration/conftest.py mock
microimpute identically. Root conftest propagates downward in pytest, so the inner one is redundant. Note: the
calibration test conftest on my branch has different fixtures (db_uri, dataset_path), so we'll have a merge conflict
— easy to resolve by combining both into one file, but worth being aware of.
QRF training subsample of 5000 is undocumented — CPS has ~200K person records. Why 5000? If it's for speed, a
comment explaining the tradeoff would help. If arbitrary, a larger sample would improve predictions.
SE + wages people get $0 in 401(k)/IRA — In the proportional RETCB_VAL allocation, if someone has both SE income
and wages, the entire amount goes to self_employed_pension_contributions first, leaving remaining = 0. Many people
have W-2 jobs plus side gigs. Worth at least a comment acknowledging this as a known limitation.
Validation script hardcodes targets — validation/validate_retirement_imputation.py re-declares all target values
instead of importing NATIONAL_CALIBRATION_TARGETS from loss.py. These will drift.

Worth discussing:

SE + wages people get $0 in 401(k)/IRA — In the proportional RETCB_VAL allocation, if someone has both SE income
and wages, the entire amount goes to self_employed_pension_contributions first, leaving remaining = 0. Many people
have W-2 jobs plus side gigs. Worth at least a comment acknowledging this as a known limitation.
Validation script hardcodes targets — validation/validate_retirement_imputation.py re-declares all target values
instead of importing NATIONAL_CALIBRATION_TARGETS from loss.py. These will drift.

Looks good:

_drop_formula_variables is a solid cleanup — my stacked dataset builder already follows this pattern (filtering to
input_variables), so making ExtendedCPS generation consistent is welcome.
Calibration target sourcing is well-documented.
The conftest try/except ImportError fix for the microimpute mock is the right approach.

* Read retirement limits from policyengine-us parameters Replace hard-coded RETIREMENT_LIMITS dict with a shared utility that reads from policyengine-us's parameter tree at runtime. This ensures limits stay in sync as policyengine-us is updated, and eliminates a maintenance risk when the same dict is duplicated in puf_impute.py (PR #554). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fixup! Read retirement limits from policyengine-us parameters * Revert test formatting to match CI Black version The fixup reformatted SQL strings in test_database_build.py using a newer Black version that disagrees with CI's lgeiger/black-action. Revert to the original formatting that CI accepts. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Add tests for retirement limits utility Verifies get_retirement_limits returns correct IRS contribution limits for 2020, 2023, and 2025 by comparing against known values. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Handle both old and new policyengine-us catch-up parameter layouts The SECURE 2.0 update (merged to pe-us main, not yet released) renames children["401k"] to children["k401"] and changes it from a simple int to an age-bracketed scale. Handle both layouts so the code works with the current release and future releases. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Use k401 parameter directly, bump policyengine-us>=1.572.5 Drop dual-path handling in favor of the current parameter layout (k401 scale with age brackets from SECURE 2.0). Bump the minimum policyengine-us version to 1.572.5 which introduced this parameter. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

MaxGhenis · 2026-03-04T13:37:02Z

I've merged #566 which extracts the retirement limits into a shared utility (utils/retirement_limits.py) that reads from policyengine-us parameters at runtime. When you rebase onto main, you can replace the duplicated limits dict in puf_impute.py with an import from that utility — addresses Ben's point 1 cleanly.

After rebasing, this should be good to merge once CI passes. Nice work on the QRF imputation approach.

MaxGhenis · 2026-03-04T13:38:25Z

Update: tried to rebase this for you but the branch is based on a very old commit (217 commits behind main). The cherry-pick/rebase hits conflicts because several files have been restructured since then (puf_impute.py, etl_national_targets.py, cps.py, loss.py, etc.).

You'll need to rebase locally — you know the retirement code best so the conflict resolution will be straightforward for you. After rebasing, you can also import from policyengine_us_data.utils.retirement_limits (from #566) instead of duplicating the limits dict.

- Correct traditional_ira_contributions: $25B → $13.2B (SOI 1304 Table 1.4 actual deduction, not total contributions) - Add traditional_401k_contributions: $567.9B (BEA/FRED employee DC contributions) - Add self_employed_pension_contribution_ald: $29.5B (SOI 1304 Table 1.4 Keogh plan deduction) - Remove roth_ira_contributions: structurally $0 due to CPS allocation bug (#553) - Update both loss.py and etl_national_targets.py Closes #553 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace sequential waterfall with proportional allocation based on administrative data. The old waterfall gave 401(k) first priority, consuming all of RETCB_VAL and leaving IRA contributions at $0 for every record. The Roth IRA allocation was also mathematically guaranteed to produce $0. The new approach splits RETCB_VAL proportionally: - DC vs IRA: 90.8% / 9.2% (BEA/FRED vs IRS SOI) - Within DC: 85% traditional / 15% Roth (Vanguard/PSCA) - Within IRA: 39.2% traditional / 60.8% Roth (IRS SOI Tables 5 & 6) All fractions are stored in imputation_parameters.yaml with sources. Contribution limits are still enforced. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Split 401(k) target: $567.9B total employee DC deferrals (BEA/FRED) into traditional $482.7B (85%) and Roth $85.2B (15%) using Vanguard How America Saves 2024 dollar share estimate - Add roth_ira_contributions target: $35.0B from IRS SOI Accumulation Tables 5 & 6 (TY 2022) — direct administrative source - Update etl_national_targets.py in parallel Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

When PUF imputation replaces social_security values, the sub-components (retirement, disability, survivors, dependents) were left unchanged, creating a mismatch. This caused a base-year discontinuity where projected years had ~3x more SS recipients than the base year, producing artificial 9-point Gini swings. The new reconcile_ss_subcomponents() function rescales sub-components proportionally after PUF imputation. New recipients (CPS had zero SS but PUF imputed positive) default to retirement. Fixes #551 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

New PUF recipients (CPS had zero SS) now get assigned based on age: >= 62 -> retirement, < 62 -> disability. Matches the CPS fallback logic. Falls back to retirement if age is unavailable. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Instead of the simple age >= 62 heuristic, train a QRF on CPS records (where the reason-code split is known) to predict shares for new PUF recipients. Uses age, gender, marital status, and other demographics. Falls back to age heuristic when microimpute is unavailable or training data is insufficient (< 100 records). 14 tests covering: - Proportional rescaling (existing recipients) - Age heuristic fallback (4 tests) - QRF share prediction (4 tests including sum-to-one and elderly-predicted-as-retirement) - Edge cases (zero imputation, missing subs, no SS) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

CPS-PUF link is statistical (not identity-based), so the paired CPS record's sub-component split is just one noisy draw. A QRF trained on all CPS SS recipients gives a better expected prediction. Also removes unnecessary try/except ImportError guard for microimpute. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Variables with formulas in policyengine-us are recomputed by the simulation engine, so storing them wastes space and can mislead validation. This removes 9 such variables from the extended CPS output (saving ~15MB). Also adds tests verifying: - No formula variables are stored (except person_id) - Stored input values match what the simulation computes - SS sub-components sum to total social_security per person Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Update _drop_formula_variables and tests to catch variables that use `adds` or `subtracts` (e.g. social_security), not just explicit formulas. These are also recomputed by the simulation engine. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

PUF clones previously copied CPS retirement contributions blindly, so a record with $0 wages could have $50k in 401(k) contributions. Train a QRF on CPS data (which has realistic income-to-contribution relationships) and predict onto the PUF half using PUF-imputed income. Post-prediction constraints enforce contribution caps, zero-out rules for records with no wages/SE income, and non-negativity. - Remove traditional_ira_contributions from IMPUTED_VARIABLES - Add CPS_RETIREMENT_VARIABLES, RETIREMENT_PREDICTORS constants - Add _get_retirement_limits() with year-specific 401k/IRA caps - Add _impute_retirement_contributions() following _impute_weeks_unemployed pattern - Integrate into puf_clone_dataset() variable routing loop - Add 34 unit tests covering constraints, routing, and limits Closes #561 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add income predictors (interest, dividends, pension, SS) to the retirement contribution QRF, matching issue #561's recommendation. Split RETIREMENT_PREDICTORS into demographic and income sublists so the test side correctly sources income from PUF imputations. Also add validation/validate_retirement_imputation.py for post-build constraint checking and aggregate comparison against calibration targets. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Wrap QRF fit/predict in try/except so a crash returns zeros instead of blowing up the entire puf_clone_dataset build. Document the pre_tax_contributions inconsistency (OVERRIDDEN vs CPS-trained sub-components) for future reconciliation. Tests: add test_qrf_failure_returns_zeros and test_training_data_failure_returns_zeros (50 total). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The conftest.py files used `if "microimpute" not in sys.modules` to decide whether to install a MagicMock. On CI (Python 3.13, where microimpute is fully installed), this condition was True at pytest startup because no code had imported microimpute yet. The mock replaced the real package, so tests that triggered CPS generation (add_rent → QRF imputation) silently got a MagicMock whose __len__ returns 0, causing the "0 input values" ValueError. Fix: use try/import instead of checking sys.modules, so the mock is only installed when microimpute genuinely cannot be imported. Also restores policyengine-us to 1.587.0 (the revert in 2566ac1 was a misdiagnosis — the conftest mock was the real root cause). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Move retirement contribution limits to imputation_parameters.yaml (single source of truth for cps.py and puf_impute.py) - Add exc_info=True to QRF exception handlers for full tracebacks - Replace TestLimitsMatchCps with TestLimitsMatchYaml that actually cross-checks against the shared YAML source - Remove redundant microimpute mock from test_calibration/conftest.py (root conftest already propagates) - Document QRF training subsample of 5000 rationale - Import targets/limits in validation script from loss.py and YAML instead of hardcoding Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Previously, filers with SE income had ALL of RETCB_VAL allocated to self_employed_pension_contributions, leaving $0 for 401(k)/IRA even when they also had wages. SE-only filers also got $0 in IRA despite being eligible. Changes: - Cap SE pension at min(25% of SE income, IRS dollar limit) so dual-income filers retain a remainder for 401(k) and IRA - Gate IRA pool on has_earned_income (wages OR SE) instead of has_wages only, so SE-only filers can receive IRA contributions - IRA pool = remaining - dc_pool (absorbs all non-DC remainder) - Add SE pension rate and dollar limits to imputation_parameters.yaml (2020-2025, sourced from IRS one-participant 401k rules) - Apply matching SE pension cap in PUF QRF constraints - Add test_se_pension_capped_at_rate_times_income No change to calibration targets (independently sourced from BEA/FRED and IRS SOI). Pre-calibration aggregates should shift closer to targets since dual-income filers now contribute to all account types. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Replace _get_retirement_limits() YAML loader with wrapper that merges policyengine-us params (401k/IRA) + YAML SE pension params - Remove retirement_contribution_limits block from YAML (now sourced from policyengine-us parameter tree via utils/retirement_limits.py) - Update validation script to use get_retirement_limits() directly - Update tests for new structure (no more year clamping for 401k/IRA) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

PavelMakarchuk · 2026-03-04T15:00:43Z

@baogorek Thanks for the thorough review — all 7 points have been addressed:

Retirement limits consolidated — Rebased on main which now includes PR Read retirement limits from policyengine-us parameters #566's shared get_retirement_limits() utility (reads 401k/IRA limits from policyengine-us parameter tree). SE pension params live in imputation_parameters.yaml. No more duplication.
Exception logging — Added exc_info=True to both _qrf_ss_shares and _impute_retirement_contributions exception handlers.
Cross-checking tests — Replaced TestLimitsMatchCps with TestLimitsStructure that validates keys and cross-checks against the policyengine-us parameter tree.
Redundant conftest removed — Inner test_calibration/conftest.py mock deleted; root conftest propagates.
Subsample documented — Added comment explaining the 5,000 record tradeoff (speed vs accuracy, diminishing returns above ~5K for 5 targets with 8 predictors).
SE + wages fix — This was the big one. SE pension is now capped at min(25% × SE_income, IRS dollar limit) before the proportional DC/IRA split, so dual-income filers keep the remainder for 401(k)/IRA. Also fixed the IRA gate from has_wages to has_earned_income so SE-only filers can receive IRA contributions. Applied in both cps.py (CPS half) and puf_impute.py (QRF constraints on PUF half).
Validation script de-hardcoded — Now imports targets from loss.py and limits from the shared utility.

CI: all quick checks pass (lint, lock freshness, smoke test, changelog, fork check). Full Modal build/test still running.

MaxGhenis

Thorough and well-sourced PR. Key improvements:

Proportional RETCB_VAL split (replacing waterfall that starved IRA contributions) using BEA/FRED + SOI admin data shares
QRF retirement imputation for PUF half with proper IRS limit capping and income eligibility gating
SS sub-component reconciliation (QRF + age heuristic fallback) ensuring subs sum to imputed total
Formula variable stripping from stored dataset to prevent stale values misleading validation
Calibration targets corrected with proper SOI/BEA sources (traditional IRA $25B -> $13.2B, 401(k) split added)
Comprehensive tests (1200+ lines) covering constants, limits, QRF mock behavior, routing, SS reconciliation, and dataset integrity

Already uses the shared get_retirement_limits utility from #566.

PavelMakarchuk mentioned this pull request Feb 25, 2026

Add calibration targets for retirement contributions #553

Closed

PavelMakarchuk marked this pull request as ready for review February 26, 2026 01:54

PavelMakarchuk requested review from MaxGhenis and baogorek and removed request for baogorek February 26, 2026 01:54

MaxGhenis mentioned this pull request Feb 26, 2026

Train models for CPS-only variables on PUF clone half #561

Closed

baogorek approved these changes Feb 26, 2026

View reviewed changes

PavelMakarchuk changed the title ~~Fix and add retirement contribution calibration targets~~ Calibrate retirement contributions: targets, SS reconciliation, and QRF imputation Feb 26, 2026

PavelMakarchuk mentioned this pull request Feb 26, 2026

Reconcile SS sub-components after PUF imputation #552

Closed

2 tasks

PavelMakarchuk marked this pull request as draft February 26, 2026 23:08

PavelMakarchuk marked this pull request as ready for review March 3, 2026 10:45

PavelMakarchuk requested a review from baogorek March 3, 2026 10:51

MaxGhenis mentioned this pull request Mar 3, 2026

Read retirement limits from policyengine-us parameters #566

Merged

2 tasks

baogorek approved these changes Mar 3, 2026

View reviewed changes

PavelMakarchuk and others added 10 commits March 4, 2026 14:44

Add towncrier changelog fragment

d3b1fe0

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Format test file with black

6e261a5

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

MaxGhenis and others added 10 commits March 4, 2026 14:46

Fix black formatting for CI (26.1.0)

ec4dec4

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fixup! Fix black formatting for CI (26.1.0)

f20cd32

PavelMakarchuk force-pushed the calibrate-retirement-contributions branch from 942de14 to a1446c4 Compare March 4, 2026 13:53

Update uv.lock for version freshness

9551659

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

PavelMakarchuk requested a review from baogorek March 4, 2026 15:00

MaxGhenis approved these changes Mar 4, 2026

View reviewed changes

MaxGhenis merged commit 7025f6a into main Mar 4, 2026
6 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Calibrate retirement contributions: targets, SS reconciliation, and QRF imputation#554

Calibrate retirement contributions: targets, SS reconciliation, and QRF imputation#554
MaxGhenis merged 21 commits intomainfrom
calibrate-retirement-contributions

PavelMakarchuk commented Feb 25, 2026 •

edited

Loading

Uh oh!

baogorek left a comment

Uh oh!

PavelMakarchuk commented Mar 3, 2026

Uh oh!

baogorek left a comment

Uh oh!

MaxGhenis commented Mar 4, 2026

Uh oh!

MaxGhenis commented Mar 4, 2026

Uh oh!

PavelMakarchuk commented Mar 4, 2026

Uh oh!

MaxGhenis left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

PavelMakarchuk commented Feb 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Additional changes

Test plan

Uh oh!

baogorek left a comment

Choose a reason for hiding this comment

Uh oh!

PavelMakarchuk commented Mar 3, 2026

Uh oh!

baogorek left a comment

Choose a reason for hiding this comment

Uh oh!

MaxGhenis commented Mar 4, 2026

Uh oh!

MaxGhenis commented Mar 4, 2026

Uh oh!

PavelMakarchuk commented Mar 4, 2026

Uh oh!

MaxGhenis left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

PavelMakarchuk commented Feb 25, 2026 •

edited

Loading